Language Model operational metrics
Operational metrics for LanguageModel runtime counters.
All metrics in this file read counters that the LM populates on each
provider call. Routing across inference, reward, and optimizer
phases follows the active op_scope (a contextvar the trainer sets via
synalinks.src.backend.common.op_scope.op_scope).
Class hierarchy:
LMOperationalMetric (base, _phase = "inference")
├── LMRewardsOperationalMetric (_phase = "reward")
└── LMOptimizersOperationalMetric (_phase = "optimizer")
AvgCacheCreationTokensPerCall
Bases: LMOperationalMetric
Average cache-creation tokens per LM call over this run.
Source code in synalinks/src/metrics/lm_metrics.py
AvgCachedTokensPerCall
Bases: LMOperationalMetric
Average cached prompt tokens per LM call over this run.
Source code in synalinks/src/metrics/lm_metrics.py
AvgCostPerCall
Bases: LMOperationalMetric
Average provider cost per LM call over this run.
Source code in synalinks/src/metrics/lm_metrics.py
AvgInputTokensPerCall
Bases: LMOperationalMetric
Average input tokens per LM call over this run.
Source code in synalinks/src/metrics/lm_metrics.py
AvgLatency
Bases: LMOperationalMetric
Average wall-clock latency in seconds per LM call over this run.
Computed as elapsed_s / calls. Because elapsed_s accumulates each
call's own duration, this reports the mean per-call latency regardless of
how many calls ran concurrently. Throughput, by contrast, divides by
the phase's wall-clock span and so does reflect concurrency. The two
coincide (latency = 1 / throughput) only when calls run serially.
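A small worked example may make the latency/throughput distinction concrete. The helper names and the timings are made up, and the sketch assumes the wall-clock span runs from the first call's start to the last call's end.

```python
def avg_latency(calls):
    """Mean per-call duration: sum of each call's own elapsed time / calls."""
    elapsed_s = sum(end - start for start, end in calls)
    return elapsed_s / len(calls)


def throughput(calls):
    """Calls per second over the wall-clock span covered by the calls."""
    span_s = max(end for _, end in calls) - min(start for start, _ in calls)
    return len(calls) / span_s


# Each tuple is (start_s, end_s) for one LM call.
serial = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 8.0)]
concurrent = [(0.0, 2.0)] * 4  # four calls running at the same time

print(avg_latency(serial), throughput(serial))          # 2.0 0.5
print(avg_latency(concurrent), throughput(concurrent))  # 2.0 2.0
```

In the serial case latency equals 1 / throughput (2.0 s and 0.5 calls/s). With four concurrent calls the latency stays at 2.0 s while throughput rises to 2.0 calls/s, so the identity no longer holds.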
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerCacheCreationTokensPerCall
Bases: LMOptimizersOperationalMetric
Average cache-creation tokens per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerCachedTokensPerCall
Bases: LMOptimizersOperationalMetric
Average cached prompt tokens per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerCostPerCall
Bases: LMOptimizersOperationalMetric
Average provider cost per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerInputTokensPerCall
Bases: LMOptimizersOperationalMetric
Average input tokens per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerLatency
Bases: LMOptimizersOperationalMetric
Average wall-clock latency (s) per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerOutputTokensPerCall
Bases: LMOptimizersOperationalMetric
Average output tokens per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerReasoningTokensPerCall
Bases: LMOptimizersOperationalMetric
Average reasoning tokens per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOptimizerTotalTokensPerCall
Bases: LMOptimizersOperationalMetric
Average total tokens per LM call during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
AvgOutputTokensPerCall
Bases: LMOperationalMetric
Average output tokens per LM call over this run.
Source code in synalinks/src/metrics/lm_metrics.py
AvgReasoningTokensPerCall
Bases: LMOperationalMetric
Average reasoning/thinking tokens per LM call over this run.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardCacheCreationTokensPerCall
Bases: LMRewardsOperationalMetric
Average cache-creation tokens per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardCachedTokensPerCall
Bases: LMRewardsOperationalMetric
Average cached prompt tokens per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardCostPerCall
Bases: LMRewardsOperationalMetric
Average provider cost per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardInputTokensPerCall
Bases: LMRewardsOperationalMetric
Average input tokens per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardLatency
Bases: LMRewardsOperationalMetric
Average wall-clock latency (s) per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardOutputTokensPerCall
Bases: LMRewardsOperationalMetric
Average output tokens per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardReasoningTokensPerCall
Bases: LMRewardsOperationalMetric
Average reasoning tokens per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgRewardTotalTokensPerCall
Bases: LMRewardsOperationalMetric
Average total tokens per LM call during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
AvgTotalTokensPerCall
Bases: LMOperationalMetric
Average total tokens (input + output) per LM call over this run.
Source code in synalinks/src/metrics/lm_metrics.py
CacheCreationTokens
Bases: LMOperationalMetric
Tokens written to the prompt cache during this run (Anthropic
cache_creation_input_tokens; these are billed at a higher rate than regular input tokens).
Source code in synalinks/src/metrics/lm_metrics.py
CacheHitRate
Bases: LMOperationalMetric
Fraction of prompt tokens served from cache: cached / prompt_tokens.
Raising this ratio is one of the biggest cost levers; aim for 0.7+ on a production workload with stable system prompts.
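As a quick illustration of the ratio, here is a minimal sketch. The function name and the numbers are hypothetical, and returning 0.0 when no prompt tokens were seen is an assumption about how the empty case might be handled.

```python
def cache_hit_rate(cached_tokens, prompt_tokens):
    """Fraction of prompt tokens served from cache: cached / prompt_tokens."""
    if prompt_tokens == 0:
        # Assumption: report 0.0 rather than dividing by zero.
        return 0.0
    return cached_tokens / prompt_tokens


# 7,000 of 10,000 prompt tokens came from the cache.
rate = cache_hit_rate(cached_tokens=7_000, prompt_tokens=10_000)
print(rate)  # 0.7
```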
Source code in synalinks/src/metrics/lm_metrics.py
CachedTokens
Bases: LMOperationalMetric
Prompt tokens served from provider-side prompt cache during this run.
For Anthropic this is reported as cache_read_input_tokens; for OpenAI
as cached_tokens. LiteLLM normalizes both into
usage.prompt_tokens_details.cached_tokens.
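Reading the normalized field could look roughly like the sketch below. It uses `SimpleNamespace` objects as stand-ins for a provider response so that it runs without any network call; the `read_cached_tokens` helper is hypothetical, and treating a missing field as zero is an assumption.

```python
from types import SimpleNamespace


def read_cached_tokens(usage):
    """Return usage.prompt_tokens_details.cached_tokens, or 0 if absent."""
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None)
    return cached or 0


# Stand-in for a usage object that reports cached prompt tokens.
with_cache = SimpleNamespace(
    prompt_tokens=2_000,
    prompt_tokens_details=SimpleNamespace(cached_tokens=1_200),
)
# Stand-in for a provider that reports no cache details at all.
without_cache = SimpleNamespace(prompt_tokens=2_000)

print(read_cached_tokens(with_cache))     # 1200
print(read_cached_tokens(without_cache))  # 0
```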
Source code in synalinks/src/metrics/lm_metrics.py
Cost
Bases: LMOperationalMetric
Cumulative provider cost (USD, as reported by LiteLLM) for this run.
Source code in synalinks/src/metrics/lm_metrics.py
ErrorRate
Bases: LMOperationalMetric
Fraction of LM calls that failed: failed / (succeeded + failed).
The headline reliability signal: successful calls bump calls, failures
bump failed_calls, so the error rate is observable even though failures
leave the token / cost / latency counters untouched.
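The formula can be sketched as follows. The function is hypothetical; its two arguments correspond to the calls and failed_calls counters named above, and returning 0.0 when nothing was attempted is an assumption.

```python
def error_rate(calls, failed_calls):
    """failed / (succeeded + failed), where `calls` counts successes only."""
    total = calls + failed_calls
    if total == 0:
        # Assumption: no attempts means no observed errors.
        return 0.0
    return failed_calls / total


# 95 successful calls and 5 failures out of 100 attempts.
rate = error_rate(calls=95, failed_calls=5)
print(rate)  # 0.05
```

Because successes and failures are counted separately, the denominator has to add them back together; dividing failed_calls by calls alone would overstate the rate.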
Source code in synalinks/src/metrics/lm_metrics.py
FailedCalls
Bases: LMOperationalMetric
LM calls that exhausted all retries and ultimately failed during this run.
Source code in synalinks/src/metrics/lm_metrics.py
FallbackActivations
Bases: LMOperationalMetric
Number of times a failed LM call triggered its fallback chain during this run.
Source code in synalinks/src/metrics/lm_metrics.py
InputTokens
Bases: LMOperationalMetric
Cumulative input (prompt) tokens consumed during this run.
Source code in synalinks/src/metrics/lm_metrics.py
LMOperationalMetric
Bases: Metric
Base class for LanguageModel runtime-counter metrics.
Subclasses set _phase to one of "inference", "reward", or
"optimizer" to read from the corresponding counter set on each
bound LM. Counters are populated by the LM based on the active
op_scope (contextvar) the trainer sets for each phase.
The metric binds itself automatically to every LanguageModel
reachable from the program (and their .fallback chains) when
program.compile() is called, and counters are summed across all.
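The binding-and-summing behaviour can be pictured with a toy traversal. All names here (`ToyLM`, `cumulated_cost`, `iter_with_fallbacks`) are hypothetical, and counting a fallback shared by two models only once is an assumption of this sketch, not a documented guarantee.

```python
class ToyLM:
    """Toy model with a single cost counter and an optional fallback."""

    def __init__(self, name, cost, fallback=None):
        self.name = name
        self.cumulated_cost = cost  # hypothetical counter name
        self.fallback = fallback


def iter_with_fallbacks(lms):
    """Yield every model reachable through .fallback, each one only once."""
    seen = set()
    stack = list(lms)
    while stack:
        lm = stack.pop()
        if lm is None or id(lm) in seen:
            continue
        seen.add(id(lm))
        yield lm
        stack.append(lm.fallback)


backup = ToyLM("backup", cost=0.02)
primary = ToyLM("primary", cost=0.10, fallback=backup)
judge = ToyLM("judge", cost=0.05, fallback=backup)  # shares the same fallback

bound = list(iter_with_fallbacks([primary, judge]))
total_cost = sum(lm.cumulated_cost for lm in bound)
print(sorted(lm.name for lm in bound), round(total_cost, 2))
```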
Source code in synalinks/src/metrics/lm_metrics.py
LMOptimizersOperationalMetric
Bases: LMOperationalMetric
Base for LM metrics scoped to the optimizer phase.
Reads from each bound LM's optimizer_cumulated_* counters, which
the LM populates while Optimizer.optimize is running. Calls made
during nested reward computation are excluded; they go to the
rewards bucket.
Source code in synalinks/src/metrics/lm_metrics.py
LMRewardsOperationalMetric
Bases: LMOperationalMetric
Base for LM metrics scoped to the reward-computation phase.
Reads from each bound LM's reward_cumulated_* counters, which the
LM populates while Trainer.compute_reward is running.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerCacheCreationTokens
Bases: LMOptimizersOperationalMetric
Tokens written to the prompt cache during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerCacheHitRate
Bases: LMOptimizersOperationalMetric
Prompt cache hit rate during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerCachedTokens
Bases: LMOptimizersOperationalMetric
Prompt tokens served from cache during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerCost
Bases: LMOptimizersOperationalMetric
Provider cost (USD) of LM calls during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerErrorRate
Bases: LMOptimizersOperationalMetric
Fraction of LM calls that failed during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerFailedCalls
Bases: LMOptimizersOperationalMetric
LM calls that failed during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerFallbackActivations
Bases: LMOptimizersOperationalMetric
Fallback activations triggered during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerInputTokens
Bases: LMOptimizersOperationalMetric
Input (prompt) tokens consumed by LM calls during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerOutputTokens
Bases: LMOptimizersOperationalMetric
Output tokens generated by LM calls during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerReasoningTokenShare
Bases: LMOptimizersOperationalMetric
Reasoning share of completion tokens during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerReasoningTokens
Bases: LMOptimizersOperationalMetric
Reasoning tokens produced during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerThroughput
Bases: LMOptimizersOperationalMetric
LM calls per second (RPS) during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerTokensPerSecond
Bases: LMOptimizersOperationalMetric
Throughput in tokens per second during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OptimizerTotalTokens
Bases: LMOptimizersOperationalMetric
Total tokens consumed by LM calls during the optimizer step.
Source code in synalinks/src/metrics/lm_metrics.py
OutputTokens
Bases: LMOperationalMetric
Cumulative output (completion) tokens generated during this run.
Source code in synalinks/src/metrics/lm_metrics.py
ReasoningTokenShare
Bases: LMOperationalMetric
Fraction of completion tokens spent on reasoning: reasoning_tokens / completion_tokens. Signals whether a thinking model is actually thinking on the workload.
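To show how the share separates workloads, here is a minimal sketch with invented numbers. The function name is hypothetical, and returning 0.0 when there are no completion tokens is an assumption.

```python
def reasoning_token_share(reasoning_tokens, completion_tokens):
    """reasoning_tokens / completion_tokens, guarded against an empty run."""
    if completion_tokens == 0:
        # Assumption: report 0.0 when nothing was generated.
        return 0.0
    return reasoning_tokens / completion_tokens


# Hypothetical workloads run against the same thinking-capable model.
hard_task = reasoning_token_share(reasoning_tokens=600, completion_tokens=800)
easy_task = reasoning_token_share(reasoning_tokens=0, completion_tokens=800)

print(hard_task)  # 0.75
print(easy_task)  # 0.0
```

A share near zero on a workload that was expected to need reasoning suggests the model is answering without thinking, or that thinking is not enabled for those calls.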
Source code in synalinks/src/metrics/lm_metrics.py
ReasoningTokens
Bases: LMOperationalMetric
Reasoning/thinking tokens produced during this run (Claude extended thinking, OpenAI o-series). Not included in the visible completion content but billed as output tokens.
Source code in synalinks/src/metrics/lm_metrics.py
RewardCacheCreationTokens
Bases: LMRewardsOperationalMetric
Tokens written to the prompt cache during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardCacheHitRate
Bases: LMRewardsOperationalMetric
Prompt cache hit rate during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardCachedTokens
Bases: LMRewardsOperationalMetric
Prompt tokens served from cache during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardCost
Bases: LMRewardsOperationalMetric
Provider cost (USD) of LM calls during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardErrorRate
Bases: LMRewardsOperationalMetric
Fraction of LM calls that failed during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardFailedCalls
Bases: LMRewardsOperationalMetric
LM calls that failed during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardFallbackActivations
Bases: LMRewardsOperationalMetric
Fallback activations triggered during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardInputTokens
Bases: LMRewardsOperationalMetric
Input (prompt) tokens consumed by LM calls during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardOutputTokens
Bases: LMRewardsOperationalMetric
Output tokens generated by LM calls during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardReasoningTokenShare
Bases: LMRewardsOperationalMetric
Reasoning share of completion tokens during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardReasoningTokens
Bases: LMRewardsOperationalMetric
Reasoning tokens produced during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardThroughput
Bases: LMRewardsOperationalMetric
LM calls per second (RPS) during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardTokensPerSecond
Bases: LMRewardsOperationalMetric
Throughput in tokens per second during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
RewardTotalTokens
Bases: LMRewardsOperationalMetric
Total tokens consumed by LM calls during reward computation.
Source code in synalinks/src/metrics/lm_metrics.py
Throughput
Bases: LMOperationalMetric
Throughput in LM calls per second (RPS) over this run.
Source code in synalinks/src/metrics/lm_metrics.py
TokensPerSecond
Bases: LMOperationalMetric
Throughput in total tokens per second over this run.
Source code in synalinks/src/metrics/lm_metrics.py
TotalTokens
Bases: LMOperationalMetric
Cumulative total tokens (input + output) for this run.